<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Chapter&nbsp;5.&nbsp;Text Tables</title>
<link href="../docbook.css" type="text/css" rel="stylesheet">
<meta content="DocBook XSL-NS Stylesheets V1.76.1" name="generator">
<meta name="keywords" content="HyperSQL, HSQLDB, SQL, Text, Tables">
<meta name="keywords" content="Hsqldb, HyperSQL, Database, JDBC, Java">
<link rel="home" href="index.html" title="HyperSQL User Guide">
<link rel="up" href="index.html" title="HyperSQL User Guide">
<link rel="prev" href="databaseobjects-chapt.html" title="Chapter&nbsp;4.&nbsp;Schemas and Database Objects">
<link rel="next" href="accesscontrol-chapt.html" title="Chapter&nbsp;6.&nbsp;Access Control">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<div class="navheader">
<table summary="Navigation header" width="100%">
<tr>
<td align="left" width="30%"><a accesskey="p" href="databaseobjects-chapt.html"><img src="../images/db/prev.png" alt="Prev"></a>&nbsp;</td><td align="center" width="40%" style="font-weight:bold;">Chapter&nbsp;5.&nbsp;Text Tables</td><td align="right" width="30%">&nbsp;<a accesskey="n" href="accesscontrol-chapt.html"><img src="../images/db/next.png" alt="Next"></a></td>
</tr>
<tr>
<td valign="top" align="left" width="30%">Chapter&nbsp;4.&nbsp;Schemas and Database Objects&nbsp;</td><td align="center" width="40%"><a accesskey="h" href="index.html"><img src="../images/db/home.png" alt="Home"></a></td><td valign="top" align="right" width="30%">&nbsp;Chapter&nbsp;6.&nbsp;Access Control</td>
</tr>
</table>
</div>
<HR>
<div class="chapter" title="Chapter&nbsp;5.&nbsp;Text Tables">
<div class="titlepage">
<div>
<div>
<h2 class="title">
<a name="texttables-chapt"></a>Chapter&nbsp;5.&nbsp;Text Tables</h2>
</div>
<div>
<h3 class="subtitle">
<i>Text Tables as a Standard Feature of Hsqldb</i>
</h3>
</div>
<div>
<div class="authorgroup">
<div class="author">
<h3 class="author">
<span class="firstname">Bob</span> <span class="surname">Preston</span>
</h3>
<div class="affiliation">
<span class="orgname">The HSQL Development Group<br>
</span>
</div>
</div>
<div class="author">
<h3 class="author">
<span class="firstname">Fred</span> <span class="surname">Toussi</span>
</h3>
<div class="affiliation">
<span class="orgname">The HSQL Development Group<br>
</span>
</div>
</div>
</div>
</div>
<div>
<p class="releaseinfo">$Revision: 4903 $</p>
</div>
<div>
<div class="legalnotice" title="Legal Notice">
<a name="N119E3"></a>
<p>Copyright 2002-2012 Bob Preston and Fred Toussi. Permission is
      granted to distribute this document without any alteration under the
      terms of the HSQLDB license. Additional permission is granted to the
      HSQL Development Group to distribute this document with or without
      alterations under the terms of the HSQLDB license.</p>
</div>
</div>
<div>
<p class="pubdate">2012-01-22 11:32:56-0500</p>
</div>
</div>
</div>
<div class="toc">
<p>
<b>Table of Contents</b>
</p>
<dl>
<dt>
<span class="section"><a href="texttables-chapt.html#ttc_overview">Overview</a></span>
</dt>
<dt>
<span class="section"><a href="texttables-chapt.html#ttc_implementation">The Implementation</a></span>
</dt>
<dd>
<dl>
<dt>
<span class="section"><a href="texttables-chapt.html#ttc_table_definition">Definition of Tables</a></span>
</dt>
<dt>
<span class="section"><a href="texttables-chapt.html#ttc_scope">Scope and Reassignment</a></span>
</dt>
<dt>
<span class="section"><a href="texttables-chapt.html#ttc_nulls">Null Values in Columns of Text Tables</a></span>
</dt>
<dt>
<span class="section"><a href="texttables-chapt.html#ttc_configuration">Configuration</a></span>
</dt>
<dt>
<span class="section"><a href="texttables-chapt.html#ttc_disconnect">Disconnecting Text Tables</a></span>
</dt>
</dl>
</dd>
<dt>
<span class="section"><a href="texttables-chapt.html#ttc_issues">Text File Usage</a></span>
</dt>
<dt>
<span class="section"><a href="texttables-chapt.html#ttc_global_props">Text File Global Properties</a></span>
</dt>
<dt>
<span class="section"><a href="texttables-chapt.html#trc_transactions">Transactions</a></span>
</dt>
</dl>
</div>
<div class="section" title="Overview">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both">
<a name="ttc_overview"></a>Overview</h2>
</div>
</div>
</div>
<p>Text Table support for HSQLDB was originally developed by Bob
    Preston independently from the Project. Subsequently Bob joined the
    Project and incorporated this feature into version 1.7.0, with a number of
    enhancements, especially the use of conventional SQL commands for
    specifying the files used for Text Tables.</p>
<p>In a nutshell, Text Tables are CSV or other delimited files treated
    as SQL tables. Any ordinary CSV or other delimited file can be used. The
    full range of SQL queries can be performed on these files, including
    SELECT, INSERT, UPDATE and DELETE. Indexes and unique constraints can be
    set up, and foreign key constraints can be used to enforce referential
    integrity between Text Tables themselves or with conventional
    tables.</p>
<p>The delimited file can be created by the engine, or an existing file
    can be used.</p>
<p>HyperSQL with Text Table support is the only comprehensive solution
    that employs the power of SQL and the universal reach of JDBC to handle
    data stored in text files.</p>
</div>
<div class="section" title="The Implementation">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both">
<a name="ttc_implementation"></a>The Implementation</h2>
</div>
</div>
</div>
<div class="section" title="Definition of Tables">
<div class="titlepage">
<div>
<div>
<h3 class="title">
<a name="ttc_table_definition"></a>Definition of Tables</h3>
</div>
</div>
</div>
<p>Text Tables are defined similarly to conventional tables with the
      added TEXT keyword:</p>
<pre class="programlisting">    CREATE TEXT TABLE &lt;tablename&gt; (&lt;column definition&gt; [&lt;constraint definition&gt;])</pre>
<p>The table is at first empty and cannot be written to. An
      additional SET command specifies the file and the separator character
      that the Text table uses:</p>
<pre class="programlisting">   SET TABLE &lt;tablename&gt; SOURCE &lt;quoted_filename_and_options&gt; [DESC]</pre>
</div>
<div class="section" title="Scope and Reassignment">
<div class="titlepage">
<div>
<div>
<h3 class="title">
<a name="ttc_scope"></a>Scope and Reassignment</h3>
</div>
</div>
</div>
<div class="itemizedlist">
<ul class="itemizedlist" type="disc">
<li class="listitem">
<p>A Text table without a file assigned to it is READ ONLY and
          EMPTY.</p>
</li>
<li class="listitem">
<p>Reassigning a Text Table definition to a new file has
          implications in the following areas:</p>
<div class="orderedlist">
<ol class="orderedlist" type="1">
<li class="listitem">
<p>The user is required to be an administrator.</p>
</li>
<li class="listitem">
<p>Existing transactions are committed at this point.</p>
</li>
<li class="listitem">
<p>Constraints, including foreign keys referencing this
              table, are kept intact. It is the responsibility of the
              administrator to ensure their integrity.</p>
</li>
</ol>
</div>
<p>The new source file is scanned and indexes are built when it
          is assigned to the table. At this point any violation of NOT NULL,
          UNIQUE or PRIMARY KEY constraints are caught and the assignment is
          aborted. However, foreign key constraints are not checked at the
          time of assignment or reassignment of the source file.</p>
</li>
</ul>
</div>
</div>
<div class="section" title="Null Values in Columns of Text Tables">
<div class="titlepage">
<div>
<div>
<h3 class="title">
<a name="ttc_nulls"></a>Null Values in Columns of Text Tables</h3>
</div>
</div>
</div>
<div class="itemizedlist">
<ul class="itemizedlist" type="disc">
<li class="listitem">
<p>Empty fields are treated as NULL. These are fields where there
          is nothing or just spaces between the separators.</p>
</li>
<li class="listitem">
<p>Quoted empty strings are treated as empty strings.</p>
</li>
</ul>
</div>
</div>
<div class="section" title="Configuration">
<div class="titlepage">
<div>
<div>
<h3 class="title">
<a name="ttc_configuration"></a>Configuration</h3>
</div>
</div>
</div>
<p>The default field separator is a comma (,). A different field
      separator can be specified within the SET TABLE SOURCE statement. For
      example, to change the field separator for the table mytable to a
      vertical bar, place the following in the SET TABLE SOURCE statement, for
      example:</p>
<div class="informalexample">
<pre class="programlisting">    SET TABLE mytable SOURCE "myfile;fs=|"</pre>
</div>
<p>Since HSQLDB treats CHAR and VARCHAR strings the same, the ability
      to assign a different separator to the latter is provided. When a
      different separator is assigned to a VARCHAR, it will terminate any CSV
      field of that type. For example, if the first field is CHAR, and the
      second field VARCHAR, and the separator fs has been defined as the pipe
      (|) and vs as the period (.) then the data in the CSV file for a row
      will look like:</p>
<pre class="screen">    First field data|Second field data.Third field data</pre>
<p>This facility in effect offers an extra, special separator which
      can be used in addition to the global separator. The following example
      shows how to change the default separator to the pipe (|), VARCHAR
      separator to the period (.) within a SET TABLE SOURCE statement:</p>
<div class="informalexample">
<pre class="programlisting">    SET TABLE mytable SOURCE "myfile;fs=|;vs=."</pre>
</div>
<p>HSQLDB also recognises the following special indicators for
      separators:</p>
<div class="variablelist" title="special indicators for separators">
<p class="title">
<b>special indicators for separators</b>
</p>
<table border="0">
<col valign="top" align="left">
<tbody>
<tr>
<td>
<p>
<span class="term">\semi</span>
</p>
</td><td>
<p>semicolon</p>
</td>
</tr>
<tr>
<td>
<p>
<span class="term">\quote</span>
</p>
</td><td>
<p>single-quote</p>
</td>
</tr>
<tr>
<td>
<p>
<span class="term">\space</span>
</p>
</td><td>
<p>space character</p>
</td>
</tr>
<tr>
<td>
<p>
<span class="term">\apos</span>
</p>
</td><td>
<p>apostrophe</p>
</td>
</tr>
<tr>
<td>
<p>
<span class="term">\n</span>
</p>
</td><td>
<p>newline - Used as an end anchor (like $ in regular
            expressions)</p>
</td>
</tr>
<tr>
<td>
<p>
<span class="term">\r</span>
</p>
</td><td>
<p>carriage return</p>
</td>
</tr>
<tr>
<td>
<p>
<span class="term">\t</span>
</p>
</td><td>
<p>tab</p>
</td>
</tr>
<tr>
<td>
<p>
<span class="term">\\</span>
</p>
</td><td>
<p>backslash</p>
</td>
</tr>
<tr>
<td>
<p>
<span class="term">\u####</span>
</p>
</td><td>
<p>a Unicode character specified in hexadecimal</p>
</td>
</tr>
</tbody>
</table>
</div>
<p>Furthermore, HSQLDB provides csv file support with three
      additional boolean options: <code class="varname">ignore_first</code>,
      <code class="varname">quoted</code> and <code class="varname">all_quoted</code>. The
      <code class="varname">ignore_first</code> option (default false) tells HSQLDB to
      ignore the first line in a file. This option is used when the first line
      of the file contains column headings. The <code class="varname">all_quoted</code>
      option (default false) tells the program that it should use quotes
      around all character fields when writing to the source file. The
      <code class="varname">quoted</code> option (default true) uses quotes only when
      necessary to distinguish a field that contains the separator character.
      It can be set to false to prevent the use of quoting altogether and
      treat quote characters as normal characters. These options may be
      specified within the <code class="literal">SET TABLE SOURCE</code>
      statement:</p>
<pre class="programlisting">    SET TABLE mytable SOURCE "myfile;ignore_first=true;all_quoted=true"</pre>
<p>When the default options <code class="literal">all_quoted=</code>
      <code class="literal">false</code> and <code class="literal">quoted=true</code> are in
      force, fields that are written to a line of the csv file will be quoted
      only if they contain the separator or the quote character. The quote
      character is doubled when used inside a string. When
      <code class="literal">all_quoted=false</code> and <code class="literal">quoted=false</code>
      the quote character is not doubled. With this option, it is not possible
      to insert any string containing the separator into the table, as it
      would become impossible to distinguish from a separator. While reading
      an existing data source file, the program treats each individual field
      separately. It determines that a field is quoted only if the first
      character is the quote character. It interprets the rest of the field on
      this basis.</p>
<p>The character encoding for the source file is<code class="literal"> ASCII
      </code>by default. To support UNICODE or source files prepared with
      different encodings this can be changed to <code class="literal">UTF-8</code> or
      any other encoding. The default is <code class="literal">encoding=ASCII </code>and
      the option <code class="literal">encoding=UTF-8</code> or other supported
      encodings can be used.</p>
<p>Finally, HSQLDB provides the ability to read a text file as READ
      ONLY, by placing the keyword "DESC" at the end of the SET TABLE SOURCE
      statement:</p>
<pre class="programlisting">    SET TABLE mytable SOURCE "myfile" DESC</pre>
<p>Text table source files are cached in memory. The maximum number
      of rows of data that are in memory at any time is controlled by the
      <code class="varname">cache_rows</code> property. The default value for
      <code class="varname">cache_rows</code> is 1000 and can be changed by setting the
      default database property .The <code class="varname">cache_size</code> property
      sets the maximum amount of memory used for each text table. The default
      is 100 KB. The properties can be set for individual text tables. These
      properties do not control the maximum size of each text table, which can
      be much larger. An example is given below:</p>
<pre class="programlisting">    SET TABLE mytable SOURCE "myfile;ignore_first=true;all_quoted=true;cache_rows=10000;cache_size=1000"</pre>
<p>The properties used in earlier versions, namely the
      <code class="varname">textdb.cache_scale</code> and the
      <code class="varname">textdb.cache_size_scale</code> can still be used.</p>
</div>
<div class="section" title="Disconnecting Text Tables">
<div class="titlepage">
<div>
<div>
<h3 class="title">
<a name="ttc_disconnect"></a>Disconnecting Text Tables</h3>
</div>
</div>
</div>
<p>Text tables may be <em class="glossterm">disconnected</em> from their
      underlying data source, i.e. the text file.</p>
<p>You can explicitly disconnect a text table from its file by
      issuing the following statement: </p>
<pre class="programlisting">    SET TABLE mytable SOURCE OFF</pre>
<p>Subsequently, <code class="literal">mytable</code> will be empty and
      read-only. However, the data source description will be preserved, and
      the table can be re-connected to it with </p>
<pre class="programlisting">    SET TABLE mytable SOURCE ON</pre>
<p>When a database is opened, if the source file for an existing text
      table is missing the table remains disconnected from its data source,
      but the source description is preserved. This allows the missing source
      file to be added to the directory and the table re-connected to it with
      the above command.</p>
<p>Disconnecting text tables from their source has several uses.
      While disconnected, the text source can be edited outside HSQLDB
      provided data integrity is respected. When large text sources are used,
      and several constraints or indexes need to be created on the table, it
      is possible to disconnect the source during the creation of constraints
      and indexes and reduce the time it takes to perform the
      operation.</p>
</div>
</div>
<div class="section" title="Text File Usage">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both">
<a name="ttc_issues"></a>Text File Usage</h2>
</div>
</div>
</div>
<p>The following information applies to the usage of text
    tables.</p>
<div class="itemizedlist" title="Text File Issues">
<p class="title">
<b>Text File Issues</b>
</p>
<ul class="itemizedlist" type="disc">
<li class="listitem">
<p>File locations are restricted to below the directory that
        contains the database, unless the
        <code class="varname">textdb.allow_full_path</code> property is set true as a
        Java system property. This feature is for security, otherwise an admin
        database user may be able to open random files. The specified text
        source path is interpreted differently according to this property. By
        default, the path is interpreted as a relative path to the directory
        path of database files, it therefore cannot contain the double dot
        notation for parent directory. This path is then appended by the
        engine to the directory path to form a full path. When the property is
        true, the path is not appended to the directory path and is used as it
        is to open the file. In this usage the path can be relative or
        absolute.</p>
</li>
<li class="listitem">
<p>All-in-memory databases can use text tables. In this usage, the
        path must be an absolute path. These text tables are always read only.
        To disable this capability for access control reasons, the
        <code class="varname">textdb.allow_full_path</code> property can be set false as
        a Java system property.</p>
</li>
<li class="listitem">
<p>Blank lines are allowed anywhere in the text file, and are
        ignored.</p>
</li>
<li class="listitem">
<p>It is possible to define a primary key, identity column, unique,
        foreign key and check constraints for text tables.</p>
</li>
<li class="listitem">
<p>When a table source file is used with the<code class="literal">
        ignore_first=true </code>option, the first, ignored line is
        replaced with a blank line after a SHUTDOWN COMPACT, unless the SOURCE
        HEADER statement has been used.</p>
</li>
<li class="listitem">
<p>An existing table source file may include CHARACTER fields that
        do not begin with the quote character but contain instances of the
        quote character. These fields are read as literal strings.
        Alternatively, if any field begins with the quote character, then it
        is interpreted as a quoted string that should end with the quote
        character and any instances of the quote character within the string
        is doubled. When any field containing the quote character or the
        separator is written out to the source file by the program, the field
        is enclosed in quote character and any instance of the quote character
        inside the field is doubled.</p>
</li>
<li class="listitem">
<p>Inserts or updates of CHARACTER type field values are allowed
        with strings that contains the linefeed or the carriage return
        character. This feature is disabled when both quoted and all_quoted
        properties are false.</p>
</li>
<li class="listitem">
<p>ALTER TABLE commands that add or drop columns or constraints
        (apart from check constraints) are not supported with text tables that
        are connected to a source. First use the SET TABLE &lt;name&gt; SOURCE
        OFF, make the changes, then turn the source ON.</p>
</li>
</ul>
</div>
</div>
<div class="section" title="Text File Global Properties">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both">
<a name="ttc_global_props"></a>Text File Global Properties</h2>
</div>
</div>
</div>
<p>The database engine uses a set of defaults for text table
    properties. Each table's data source may override these defaults. It is
    also possible to override the defaults globally, so they apply to all text
    tables. The statement SET DATABASE TEXT TABLE DEFAULTS &lt;properties
    string&gt; can be used to override the default global properties. An
    example is given below:</p>
<pre class="programlisting">    SET DATABASE TEXT TABLE DEFAULTS 'all_quoted=true;encoding=UTF-8;cache_rows=10000;cache_size=2000'</pre>
<div class="itemizedlist" title="List of supported global properties">
<p class="title">
<b>List of supported global properties</b>
</p>
<ul class="itemizedlist" type="disc">
<li class="listitem">
<p>
<code class="literal">fs=,</code>
</p>
</li>
<li class="listitem">
<p>
<code class="literal">vs=,</code>
</p>
</li>
<li class="listitem">
<p>
<code class="literal">quoted=false</code>
</p>
</li>
<li class="listitem">
<p>
<code class="literal">all_quoted=false</code>
</p>
</li>
<li class="listitem">
<p>
<code class="literal">ignore_first=false</code>
</p>
</li>
<li class="listitem">
<p>
<code class="literal">encoding=ASCII</code>
</p>
</li>
<li class="listitem">
<p>
<code class="literal">cache_rows=1000</code>
</p>
</li>
<li class="listitem">
<p>
<code class="literal">cache_size=100</code>
</p>
</li>
<li class="listitem">
<p>
<code class="literal">textdb.allow_full_path=false (a system
        property)</code>
</p>
</li>
</ul>
</div>
</div>
<div class="section" title="Transactions">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both">
<a name="trc_transactions"></a>Transactions</h2>
</div>
</div>
</div>
<p>Text tables fully support transactions. New or changed rows that
    have not been committed are not updated in the source file. Therefore the
    source file always contains committed rows.</p>
<p>However, text tables are not as resilient to machine crashes as
    other types of tables. If the crash happens while the text source is being
    written to, the text source may contain only some of the changes made
    during a committed transaction. With other types of tables, additional
    mechanisms ensure the integrity of the data and this situation will not
    arise.</p>
</div>
</div>
<HR xmlns:xi="http://www.w3.org/2001/XInclude">
<P xmlns:xi="http://www.w3.org/2001/XInclude" class="svnrev">$Revision: 4864 $</P>
<div class="navfooter">
<hr>
<table summary="Navigation footer" width="100%">
<tr>
<td align="left" width="40%"><a accesskey="p" href="databaseobjects-chapt.html"><img src="../images/db/prev.png" alt="Prev"></a>&nbsp;</td><td align="center" width="20%">&nbsp;</td><td align="right" width="40%">&nbsp;<a accesskey="n" href="accesscontrol-chapt.html"><img src="../images/db/next.png" alt="Next"></a></td>
</tr>
<tr>
<td valign="top" align="left" width="40%">Chapter&nbsp;4.&nbsp;Schemas and Database Objects&nbsp;</td><td align="center" width="20%"><a accesskey="h" href="index.html"><img src="../images/db/home.png" alt="Home"></a></td><td valign="top" align="right" width="40%">&nbsp;Chapter&nbsp;6.&nbsp;Access Control</td>
</tr>
</table>
</div>
</body>
</html>
